//$$ newmatb.txt                           Design

Copyright (C) 1991,2,3,4: R B Davies

Please treat this as an academic publication. You can use the ideas but
please acknowledge in any publication. See also my publication in the
proceedings of the second (1994) object-oriented numerics conference,
published by Rogue Wave Software.


Notes on the design of the package
==================================

Contents
========

   What this package is for
   What size of matrices?
   Allow matrix expressions?
   Which matrix types?
   What element types?
   Naming convention
   Row and Column index ranges
   Structure of matrix objects
   Data storage - one block or several
   Data storage - by row or by column or other
   Storage of symmetric matrices
   Element access - method and checking
   Use iterators?
   Memory management - reference counting or status variable?
   Evaluation of expressions - use two stage method?
   How to overcome an explosion in number of operations
   Destruction of temporaries
   A calculus of matrix types
   Error handling
   Sparse matrices
   Complex matrices

I describe some of the ideas behind this package and some of the
decisions I needed to make, and give some details about the way it
works. You don't need to read this file in order to use the package.

I don't think it is obvious what is the best way of structuring a
matrix package, and I don't think you can figure this out with
"thought" experiments. Different people have to try out different
approaches, and someone else may have to figure out which is best. Or,
more likely, the ultimate packages will lift ideas from a variety of
trial packages. So I don't claim my package is an "ultimate" package,
but simply a trial of a number of ideas. I do hope it will meet the
immediate requirements of some people who need to carry out matrix
calculations.
What this package is for
------------------------

The package is used for the manipulation of matrices, including the
standard operations such as multiplication, as understood by numerical
analysts, engineers and mathematicians.

A matrix is a two dimensional array of numbers. However, very special
operations, such as matrix multiplication, are defined specifically
for matrices. This means that a "matrix" package tends to be different
from a general "array" package. I see a matrix package as providing
the following:

1.  Evaluation of matrix expressions in a form familiar to scientists
    and engineers. For example  A = B * (C + D.t());
2.  Access to the elements of a matrix;
3.  Access to submatrices;
4.  Common elementary matrix functions such as determinant and trace;
5.  A platform for developing advanced matrix functions such as
    eigenvalue analysis;
6.  Good efficiency and storage management;
7.  Graceful exit from errors.

It may also provide

8.  A variety of types of elements (eg real and complex);
9.  A variety of types of matrices (eg rectangular and symmetric).

In contrast an array package should provide

1'. Copying of arrays with a single instruction; possibly some
    arithmetic operations, say + - and scalar + - * /; it may provide
    matrix multiplication as a function;
2'. High speed access to elements directly and with iterators;
3'. Access to subarrays and rows (and columns?);
6'. Good efficiency and storage management;
7'. Graceful exit from errors;
8'. A variety of types of elements (eg real and complex);
9'. One, two and three dimensional arrays, at least, with the starting
    points of the indices set by the user; it may have arrays with
    ragged rows.

It may be possible to amalgamate these two sets of requirements to
some extent. However my package is definitely oriented towards the
first set. Even within the bounds set by the first set of requirements
there is substantial opportunity for variation between what different
matrix packages might provide.
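The expression style of requirement 1 can be sketched with a toy
value-semantics matrix class. This is purely illustrative: the class
name Mat and its layout are my own inventions for the sketch, not the
package's classes, though the 1-based A(i,j) and t() notation follows
the conventions described in this document.

```cpp
#include <cstddef>
#include <vector>

// Toy matrix class (illustrative only, not the package's own code)
// just rich enough to evaluate  A = B * (C + D.t());
struct Mat {
    int nrows, ncols;
    std::vector<double> v;                 // elements stored by row
    Mat(int r, int c) : nrows(r), ncols(c), v(r * c, 0.0) {}
    // 1-based element access, as in the package's convention
    double& operator()(int i, int j) { return v[(i - 1) * ncols + (j - 1)]; }
    double operator()(int i, int j) const { return v[(i - 1) * ncols + (j - 1)]; }
    Mat t() const {                        // transpose
        Mat r(ncols, nrows);
        for (int i = 1; i <= nrows; ++i)
            for (int j = 1; j <= ncols; ++j) r(j, i) = (*this)(i, j);
        return r;
    }
};

Mat operator+(const Mat& a, const Mat& b) {
    Mat r(a.nrows, a.ncols);
    for (std::size_t k = 0; k < r.v.size(); ++k) r.v[k] = a.v[k] + b.v[k];
    return r;
}

Mat operator*(const Mat& a, const Mat& b) {
    Mat r(a.nrows, b.ncols);               // starts zeroed
    for (int i = 1; i <= a.nrows; ++i)
        for (int j = 1; j <= b.ncols; ++j)
            for (int k = 1; k <= a.ncols; ++k) r(i, j) += a(i, k) * b(k, j);
    return r;
}
```

A class written this naive way evaluates each operator immediately and
creates a temporary at every step; the later sections on memory
management and two stage evaluation describe how the package avoids
those costs.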
It is not possible to build a matrix package that will meet everyone's
requirements. In many cases, if you put in one facility you impose
overheads on everyone using the package, both in the storage required
for the program and in efficiency. Likewise a package that is
optimised towards handling large matrices is likely to be less
efficient for very small matrices, where the administration time for
the matrix may become significant compared with the time to carry out
the operations. It is better to provide a variety of (hopefully
compatible) packages so that most users can find one that meets their
requirements. This package is intended to be one of these packages;
but not all of them.

Since my background is in statistical methods, this package is
oriented towards the kinds of things you need for statistical
analyses.

What size of matrices?
----------------------

A matrix package may target small matrices (say 3 x 3), or medium
sized matrices, or very large matrices. A package targeting very small
matrices will seek to minimise administration. A package for medium
sized or very large matrices can spend more time on administration in
order to conserve space or optimise the evaluation of expressions. A
package for very large matrices will need to pay special attention to
storage and numerical properties. This package is designed for medium
sized matrices. This means it is worth introducing some optimisations,
but I don't have to worry about the 64K limit that some compilers
impose.

Allow matrix expressions?
-------------------------

I want to be able to write matrix expressions the way I would on
paper. So if I want to multiply two matrices and then add the
transpose of a third one I can write something like

   X = A * B + C.t();

I want this expression to be evaluated with close to the same
efficiency as a hand-coded version. This is not so much of a problem
with expressions including a multiply, since the multiply will
dominate the time.
However, it is not so easy to achieve with expressions with just + and
- .

A second requirement is that temporary matrices generated during the
evaluation of an expression are destroyed as quickly as possible.

A desirable feature is that a certain amount of "intelligence" be
displayed in the evaluation of an expression. For example, in the
expression

   X = A.i() * B;

where i() denotes inverse, it would be desirable if the inverse wasn't
explicitly calculated.

Which matrix types?
-------------------

As well as the usual rectangular matrices, the matrices occurring
repeatedly in numerical calculations are upper and lower triangular
matrices, symmetric matrices and diagonal matrices. This is
particularly the case in calculations involving least squares and
eigenvalue calculations. So as a first stage these were the types I
decided to include.

It is also necessary to have the types row vector and column vector.
In a "matrix" package, in contrast to an "array" package, it is
necessary to have both of these types since they behave differently in
matrix expressions. The vector types can be derived from the
rectangular matrix type, so having them does not greatly increase the
complexity of the package.

The problem with having several matrix types is the number of versions
of the binary operators one needs. If one has 5 distinct matrix types
then a simple package will need 25 versions of each of the binary
operators. In fact, we can evade this problem, but at the cost of some
complexity.

What element types?
-------------------

Ideally we would allow element types double, float, complex and int,
at least. It would be reasonably easy, using templates or equivalent,
to provide a package which could handle a variety of element types.
However, as soon as one starts implementing the binary operators
between matrices with different element types, one again gets an
explosion in the number of operations one needs to consider. Hence I
decided to implement only one element type.
But the user can decide whether this is float or double. The package
assumes elements are of type Real. The user typedefs Real to float or
double.

In retrospect, it would not be too hard to include matrices with
complex elements. It might also be worth including symmetric and
triangular matrices with extra precision elements (double or long
double), to be used for storage only and with a minimum of operations
defined. These would be used for accumulating the results of sums of
squares and product matrices or multistage Householder
triangularisations.

Naming convention
-----------------

How are classes and public member functions to be named? As a general
rule I have spelt identifiers out in full with individual words being
capitalised. For example "UpperTriangularMatrix". If you don't like
this you can #define or typedef shorter names. This convention means
you can select an abbreviation scheme that makes sense to you.

The convention causes problems for Glockenspiel C++ on a PC feeding
into Microsoft C. The names Glockenspiel generates exceed the 32
characters recognised by Microsoft C and ambiguities result. So it is
necessary to #define shorter names.

Exceptions to the general rule are the functions for transpose and
inverse. To make matrix expressions more like the corresponding
mathematical formulae, I have used the single letter abbreviations
t() and i().

Row and Column index ranges
---------------------------

In mathematical work matrix subscripts usually start at one. In C,
array subscripts start at zero. In Fortran, they start at one. The
possibilities for this package were to make them start at 0 or 1 or be
arbitrary. Alternatively one could specify an "index set" for indexing
the rows and columns of a matrix. One would be able to add or multiply
matrices only if the appropriate row and column index sets were
identical.
In fact, I adopted the simpler convention of making the rows and
columns of a matrix be indexed by an integer starting at one,
following the traditional convention. In an earlier version of the
package I had them starting at zero, but even I was getting mixed up
when trying to use this earlier package. So I reverted to the more
usual notation.

Structure of matrix objects
---------------------------

Each matrix object contains the basic information, such as the number
of rows and columns and a status variable, plus a pointer to the data
array, which is on the heap.

Data storage - one block or several
-----------------------------------

In this package the elements of the matrix are stored as a single
array. Alternatives would have been to store each row as a separate
array or a set of adjacent rows as a separate array. The present
solution simplifies the program but limits the size of matrices in
systems that have a 64k byte (or other) limit on the size of arrays.
The large arrays may also cause problems for memory management in
smaller machines.

Data storage - by row or by column or other
-------------------------------------------

In Fortran two dimensional arrays are stored by column. In most other
systems they are stored by row. I have followed this latter
convention. This makes it easier to interface with other packages
written in C but harder to interface with those written in Fortran.
This may have been a wrong decision. Most work on the efficient
manipulation of large matrices is being done in Fortran. It would have
been easier to use this work if I had adopted the Fortran convention.

An alternative would be to store the elements by mid-sized rectangular
blocks. This might impose less strain on memory management when one
needs to access both rows and columns.

Storage of symmetric matrices
-----------------------------

Symmetric matrices are stored as lower triangular matrices.
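The two storage conventions just described come down to simple index
arithmetic. The following sketch (illustrative helper names, not the
package's internals) shows where element (i,j), with 1-based
subscripts, lives in the single data array: first for a rectangular
matrix stored by row, then for a symmetric matrix whose lower triangle
is packed row by row.

```cpp
// Rectangular matrix stored by row: rows 1..i-1 each contribute
// ncols elements before row i begins.
int rect_offset(int i, int j, int ncols) {
    return (i - 1) * ncols + (j - 1);
}

// Symmetric matrix stored as its lower triangle, packed by row:
// row i holds i stored elements, so rows 1..i-1 contribute
// i*(i-1)/2 elements before row i begins.
int lower_offset(int i, int j) {
    if (j > i) { int t = i; i = j; j = t; }  // symmetry: (i,j) == (j,i)
    return i * (i - 1) / 2 + (j - 1);
}
```

The swap in lower_offset is what lets a single stored triangle serve
the whole symmetric matrix.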
The decision was pretty arbitrary, but it does slightly simplify the
Cholesky decomposition program.

Element access - method and checking
------------------------------------

We want to be able to use the notation A(i,j) to specify the (i,j)-th
element of a matrix. This is the way mathematicians expect to address
the elements of matrices. I consider the notation A[i][j] totally
alien, although I may include it in a later version to help people
converting from C.

There are two ways of working out the address of A(i,j). One is to use
a "dope" vector which contains the first address of each row.
Alternatively you can calculate the address using the formula
appropriate for the structure of A. I use this second approach. It is
probably slower, but saves worrying about an extra bit of storage.

The other question is whether to check that i and j are in range. I do
carry out this check, following years of experience with both systems
that do and systems that don't do this check. I would hope that the
routines I supply with this package will reduce your need to access
elements of matrices, so speed of access is not a high priority.

Use iterators?
--------------

Iterators are an alternative way of providing fast access to the
elements of an array or matrix when they are to be accessed
sequentially. They need to be customised for each type of matrix. I
have not implemented iterators in this package, although some
iterator-like functions are used for some row and column functions.

Memory management - reference counting or status variable?
----------------------------------------------------------

Consider the instruction

   X = A + B + C;

To evaluate this, a simple program will add A to B, putting the total
in a temporary T1. Then it will add T1 to C, creating another
temporary T2, which will be copied into X. T1 and T2 will sit around
till the end of the block.
It would be faster if the program recognised that T1 was temporary and
stored the sum of T1 and C back into T1 instead of creating T2, and
then avoided the final copy by just assigning the contents of T1 to X
rather than copying. In this case there will be no temporaries
requiring deletion. (More precisely, there will be a header to be
deleted but no contents.)

For an instruction like

   X = (A * B) + (C * D);

we can't easily avoid one temporary being left over, so we would like
this temporary deleted as quickly as possible.

I provide the functionality for doing this by attaching a status
variable to each matrix. This indicates whether the matrix is
temporary, so that its memory is available for recycling or deleting.
Any matrix operation checks the status variables of the matrices it is
working with and recycles or deletes any temporary memory.

An alternative or additional approach would be to use delayed copying.
If a program requests a matrix to be copied, the copy is delayed until
an instruction is executed which modifies the memory of either the
original matrix or the copy. This saves the unnecessary copying in the
previous examples. However, it does not provide the additional
functionality of my approach. It would be possible to have both
delayed copy and the tagging of temporaries, but this seemed an
unnecessary complexity. In particular, delayed copy mechanisms seem to
require two calls to the heap manager, using extra time and making it
more difficult to build a memory compacting mechanism.

Evaluation of expressions - use two stage method?
-------------------------------------------------

Consider the instruction

   X = B - X;

The simple program will subtract X from B, store the result in a
temporary T1 and copy T1 into X. It would be faster if the program
recognised that the result could be stored directly into X. This would
happen automatically if the program could look at the instruction
first and mark X as temporary.
C programmers would expect to avoid the same problem with

   X = X - B;

by using an operator -= (which I haven't provided, yet)

   X -= B;

However this is an unnatural notation for non-C users, and it is much
nicer to write

   X = X - B;

and know that the program will carry out the simplification.

Another example where this intelligent analysis of an instruction is
helpful is in

   X = A.i() * B;

where i() denotes inverse. Numerical analysts know it is inefficient
to evaluate this expression by carrying out the inverse operation and
then the multiply. Yet it is a convenient way of writing the
instruction. It would be helpful if the program recognised this
expression and carried out the more appropriate approach.

To carry out this "intelligent" analysis of an instruction, matrix
expressions are evaluated in two stages. In the first stage a tree
representation of the expression is formed. For example (A+B)*C is
represented by the tree

          *
         / \
        +   C
       / \
      A   B

Rather than adding A and B, the + operator yields an object of a class
"AddedMatrix" which is just a pair of pointers to A and B. Then the *
operator yields a "MultipliedMatrix" which is a pair of pointers to
the "AddedMatrix" and C. The tree is examined for any simplifications
and then evaluated recursively.

Further possibilities not yet included are to recognise A.t()*A and
A.t()+A as symmetric, or to improve the efficiency of evaluation of
expressions like A+B+C, A*B*C, A*B.t() [t() denotes transpose].

One of the disadvantages of the two-stage approach is that the types
of matrix expressions are determined at run-time. So the compiler will
not detect errors of the type

   Matrix M; DiagonalMatrix D; ....; D = M;

We don't allow conversions using = when information would be lost.
Such errors will be detected only when the statement is executed.
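The two stages can be sketched in a few lines. This is a stripped-down
illustration of the mechanism, not the package's own code: here an
"AddedMatrix" node is literally just a pair of pointers, and the whole
expression is evaluated only when it is assigned to a matrix, at which
point the result can be written straight into the target.

```cpp
#include <cstddef>
#include <vector>

struct Mat;

// Stage one: the tree node built by operator+ (a pair of pointers,
// as described above; names illustrative).
struct AddedMatrix {
    const Mat* a;
    const Mat* b;
};

struct Mat {
    std::vector<double> v;                    // elements (layout irrelevant here)
    Mat(std::size_t n = 0) : v(n, 0.0) {}
    Mat& operator=(const AddedMatrix& e);     // stage two: evaluate the tree
};

AddedMatrix operator+(const Mat& a, const Mat& b) {
    AddedMatrix e = { &a, &b };               // no addition performed yet
    return e;
}

Mat& Mat::operator=(const AddedMatrix& e) {
    v.resize(e.a->v.size());
    for (std::size_t k = 0; k < v.size(); ++k)
        v[k] = e.a->v[k] + e.b->v[k];         // result written straight into *this
    return *this;
}
```

Because evaluation is deferred until the assignment, a statement like
X = X + B can write each element straight back into X with no
temporary at all, which is the kind of simplification discussed above.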
How to overcome an explosion in number of operations
----------------------------------------------------

The package attempts to solve the problem of the large number of
versions of the binary operations required when one has a variety of
types. With n types of matrices the binary operations will each
require n-squared separate algorithms. Some reduction in the number
may be possible by carrying out conversions. However the situation
rapidly becomes impossible with more than 4 or 5 types.

Doug Lea told me that it was possible to avoid this problem. I don't
know what his solution is. Here's mine.

Each matrix type includes routines for extracting individual rows or
columns. I assume a row or column consists of a sequence of zeros, a
sequence of stored values and then another sequence of zeros. Only a
single algorithm is then required for each binary operation. The rows
can be located very quickly since most of the matrices are stored row
by row. Columns must be copied, so access to them is somewhat slower.
As far as possible my algorithms access the matrices by row.

An alternative approach of using iterators would be slower, since the
iterators would involve a virtual function access at each step.

There is another approach. Each of the matrix types defined in this
package can be set up so that both rows and columns have their
elements at equal intervals, provided we are prepared to store the
rows and columns in up to three chunks. With such an approach one
could write a single "generic" algorithm for each of multiply and add.
This would be a reasonable alternative to my approach.

I provide several algorithms for operations like + . If one is adding
two matrices of the same type then there is no need to access the
individual rows or columns, and a faster general algorithm is
appropriate.

Generally the method works well. However symmetric matrices are not
always handled very efficiently (yet) since complete rows are not
stored explicitly.
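The row-extraction idea can be sketched as follows. The names RowSlice
and add_rows are my own for this illustration, not the package's: a
row is described by a count of leading zeros plus its stored values
(trailing zeros are implied by the matrix width), and one addition
routine then serves every pair of matrix types that can deliver their
rows in this form.

```cpp
#include <vector>

// A row of any matrix type, in the form assumed above: leading
// zeros, stored values, trailing zeros. (Illustrative names.)
struct RowSlice {
    int skip;                    // number of leading zeros
    std::vector<double> val;     // the stored values
};

// The single addition algorithm: works for any two row types of the
// same total width. The result's stored band covers the union of the
// two input bands.
RowSlice add_rows(const RowSlice& a, const RowSlice& b) {
    int lo  = a.skip < b.skip ? a.skip : b.skip;
    int ahi = a.skip + (int)a.val.size();
    int bhi = b.skip + (int)b.val.size();
    int hi  = ahi > bhi ? ahi : bhi;
    RowSlice r;
    r.skip = lo;
    r.val.assign(hi - lo, 0.0);
    for (int k = 0; k < (int)a.val.size(); ++k) r.val[a.skip - lo + k] += a.val[k];
    for (int k = 0; k < (int)b.val.size(); ++k) r.val[b.skip - lo + k] += b.val[k];
    return r;
}
```

For example, a row of an upper triangular matrix (zeros, then values
to the end) and a row of a diagonal matrix (zeros, one value, zeros)
can be added by this one routine, with no algorithm written
specifically for that pair of types.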
The original version of the package did not use this access by row or
column method and provided the multitude of algorithms for the
combinations of different matrix types. The code file length turned
out to be just a little longer than the present one when providing the
same facilities with 5 distinct types of matrices. It would have been
very difficult to increase the number of matrix types in the original
version. Apparently 4 to 5 types is about the break-even point for
switching to the approach adopted in the present package.

However it must also be admitted that there is a substantial overhead
in the approach adopted in the present package for small matrices. The
test program developed for the original version of the package takes
30 to 50% longer to run with the current version. This is for matrices
in the range 6x6 to 10x10. To try to improve the situation a little I
do provide an ordinary matrix multiplication routine for the case when
all the matrices involved are rectangular.

Destruction of temporaries
--------------------------

Versions before version 5 of newmat did not work correctly with Gnu
C++. This was because the tree structure used to represent a matrix
expression was set up on the stack. This was fine for AT&T, Borland
and Zortech C++. However Gnu C++ destroys temporary structures as soon
as the function that accesses them finishes. The other compilers wait
until the end of the current expression or current block.

To overcome this problem, there is now an option to store the
temporaries forming the tree structure on the heap (created with new)
and to delete them explicitly. Activate the definition of
TEMPS_DESTROYED_QUICKLY to set this option. In fact, I suggest this be
the default option since, with it, the package uses less storage and
runs faster.

There still exist situations where Gnu C++ will go wrong. These
include statements like

   A = X * Matrix(P * Q);
   Real r = (A*B)(3,4);

Neither of these kinds of statements will occur often in practice.
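The heap-based scheme can be illustrated in miniature. This sketch is
not the package's code: it just shows the principle that tree nodes
are created with new and deleted explicitly by the evaluation routine
itself, so the package controls exactly when each temporary dies
rather than relying on when a particular compiler destroys its stack
temporaries.

```cpp
// Illustrative expression-tree node, allocated on the heap. A leaf
// carries a value; an interior node (here always an addition) carries
// two children.
struct Node {
    double value;
    Node* left;
    Node* right;                          // null pointers mark a leaf
    Node(double v) : value(v), left(0), right(0) {}
    Node(Node* l, Node* r) : value(0), left(l), right(r) {}
};

// Evaluation consumes the tree: each node is deleted as soon as its
// value has been used, so no temporary outlives the evaluation.
double evaluate(Node* n) {
    double result = n->value;
    if (n->left) result = evaluate(n->left) + evaluate(n->right);
    delete n;                             // temporary destroyed immediately
    return result;
}
```

After evaluate returns, every node of the tree has been freed; nothing
is left for the compiler to clean up at the end of the expression or
block.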
A calculus of matrix types
--------------------------

The program needs to be able to work out the class of the result of a
matrix expression. This is to check that a conversion is legal or to
determine the class of a temporary. To assist with this, a class
MatrixType is defined. Operators +, -, *, >= are defined on
MatrixType to calculate the types of the results of expressions or to
check that conversions are legal.

Error handling
--------------

The package now does have a moderately graceful exit from errors.
Originally I thought I would wait until exceptions became available in
C++. Trying to set up an error handling scheme that did not involve an
exception-like facility was just too complicated. Version 5 of this
package included the beginnings of such an approach. The approach in
the present package, attempting to implement C++ exceptions, is not
completely satisfactory, but seems a good interim solution.

The exception mechanism cannot clean up objects explicitly created by
new. This must be explicitly carried out by the package writer or the
package user. I have not yet done this with the present package, so
occasionally a little garbage may be left behind after an exception. I
don't think this is a big problem, but it is one that needs fixing.

I identify five categories of errors:

   Programming error   - eg illegal conversion of a matrix, subscript
                         out of bounds, matrix dimensions don't match;
   Illegal data error  - eg Cholesky of a non-positive definite
                         matrix;
   Out of space error  - "new" returns a null pointer;
   Convergence failure - an iterative operation fails to converge;
   Internal error      - failure of an internal check.

Some of these have subcategories, which are nicely represented by
derived exception classes. But I don't think it is a good idea for
package users to refer to these, as they may change in the next
release of the package.

Sparse matrices
---------------

The package does not yet support sparse matrices.
For sparse matrices there is going to be some kind of structure
vector. It is going to have to be calculated for the results of
expressions in much the same way that types are calculated. In
addition, a whole new set of row and column operations would have to
be written.

Sparse matrices are important for people solving large sets of
differential equations as well as being important for statistical and
operational research applications. I think a comprehensive matrix
package needs to implement sparse matrices.

Complex matrices
----------------

The package does not yet support matrices with complex elements. There
are at least two approaches to including these. One is to have
matrices with complex elements. This probably means making new
versions of the basic row and column operations for Real*Complex,
Complex*Complex, Complex*Real and similarly for + and -. This would be
OK, except that I am also going to have to do the same for sparse
matrices, and when you put these together the whole thing will get out
of hand.

The alternative is to represent a complex matrix by a pair of Real
matrices. One probably needs another level of decoding of expressions,
but I think it might still be simpler than the first approach. There
is going to be a problem with accessing elements, and the final
package may be a little less efficient than one using the previous
representation.

Complex matrices are used extensively by electrical engineers and
really should be fully supported in a comprehensive package.
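The pair-of-real-matrices representation rests on a standard identity:
if the complex matrix is A + iB, then (A + iB)(C + iD) =
(AC - BD) + i(AD + BC), so a complex product costs four real products
and two real additions. The sketch below is illustrative only (the
names RMat, CpxMat and the helper functions are mine, not the
package's), using flat n x n real matrices stored by row.

```cpp
#include <cstddef>
#include <vector>

typedef std::vector<double> RMat;        // n x n real matrix, stored by row

RMat mul(const RMat& a, const RMat& b, int n) {
    RMat r(n * n, 0.0);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            for (int k = 0; k < n; ++k)
                r[i * n + j] += a[i * n + k] * b[k * n + j];
    return r;
}

RMat add(const RMat& a, const RMat& b) {
    RMat r(a.size());
    for (std::size_t k = 0; k < r.size(); ++k) r[k] = a[k] + b[k];
    return r;
}

RMat sub(const RMat& a, const RMat& b) {
    RMat r(a.size());
    for (std::size_t k = 0; k < r.size(); ++k) r[k] = a[k] - b[k];
    return r;
}

struct CpxMat { RMat re, im; };          // the pair of real matrices

// Complex product from four real products:
// (A + iB)(C + iD) = (AC - BD) + i(AD + BC)
CpxMat mul(const CpxMat& x, const CpxMat& y, int n) {
    CpxMat r;
    r.re = sub(mul(x.re, y.re, n), mul(x.im, y.im, n));
    r.im = add(mul(x.re, y.im, n), mul(x.im, y.re, n));
    return r;
}
```

The appeal of this approach is visible here: complex multiply, add and
subtract all reduce to the existing real operations, so the real row
and column machinery is reused instead of being triplicated for the
Real*Complex, Complex*Complex and Complex*Real cases.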